Even a well-designed and operated Exchange system
will eventually experience problems that you need to identify and
repair. The previous section explained the troubleshooting methodology.
This section highlights some of the top tools available for
troubleshooting.
1. Identifying and Resolving Performance Problems
The Exchange Management
Console (EMC) ships with a collection of troubleshooting and diagnostic
tools. A number of the tools are hosted within the EMC, whereas some,
such as the Best Practices Analyzer, are separate executables that can
be launched from within the toolbox or have to be installed separately.
This section discusses a few of the top tools in more detail. You can also use a number of additional tools. Table 1 lists these tools and their functions.
Table 1. Troubleshooting Tools
TOOL | DESCRIPTION |
---|---|
DNSLint | Helps diagnose common DNS configuration errors across multiple DNS servers. DNSLint is useful for identifying connectivity issues. |
Error Code Lookup (Err.exe) | Determines error values from decimal and hexadecimal error codes. Use it whenever a log or event reports an error code. |
Event Viewer (eventvwr.msc) | An MMC snap-in for viewing logged events. Use Event Viewer as the starting point in every case. |
LDP (ldp.exe) | A GUI tool used to perform LDAP operations (connect, bind, search, modify, add, delete) against Active Directory or an LDAP-compatible directory. LDP can display all property information about objects in the directory and can help identify user configuration issues. |
Process Monitor | A monitoring tool for Windows that shows real-time information from the file system, registry, and processes/threads. Process Monitor is helpful for diagnosing performance issues. |
Microsoft Product Support Reports | Gathers critical system and logging information useful for troubleshooting support issues. This tool is useful for general data gathering. Under most circumstances Microsoft PSS will send the customer a link to run the tool. |
1.1. Microsoft Exchange Best Practices Analyzer
The Microsoft Exchange Best Practices Analyzer (ExBPA) helps administrators assess the health of their servers and topology. The tool scans the live environment
and compares the results against an extensive list of best practices defined
by Microsoft. It pulls information from Active Directory, the registry,
WMI, the IIS metabase, and Performance Monitor, and it also collects
useful information about the Exchange organization. Because it is a
stand-alone tool, it has its own help file and is not integrated into
the standard Exchange help. In most cases ExBPA also displays information
on how to correct an issue it identifies. You have a lot of flexibility
when running the tool: you can scan the entire Exchange
organization or scope the scan down to a single server. Several types of
scans are available, as shown in Table 2.
Table 2. ExBPA Scan Types
SCAN TYPE | SCAN ACTIONS | WHEN TO USE |
---|---|---|
Health Check | Performs a full scan checking for errors, warnings, non-default configurations, and configuration changes. Optionally, it can take samples of performance counters over a two-hour period. | Use this scan type to check the health of the organization or to troubleshoot a specific problem. |
Permissions Check | Scans the Active Directory domain naming context and the Exchange configuration naming context. | Run this scan if you suspect a permissions access issue. |
Connectivity Check | Tries to validate all network connectivity and Active Directory access. | This scan helps troubleshoot network connectivity issues. It can be very useful if you have firewalls in the topology. |
Baseline | Allows an administrator to set threshold values that will be checked against the server's actual configuration. | Useful to report on deviations from baseline values. |
One interesting way to use ExBPA
is to compare information between multiple runs to see what
configuration items have changed. If an issue arises, it is easy to
compare a new report against a known good baseline and quickly spot any
differences.
1.2. Microsoft Exchange Performance Troubleshooter
The Microsoft Exchange Performance Troubleshooter (ExPTA) helps locate performance-related issues. As you can see in Figure 1,
an administrator selects the type of performance symptom being observed.
RPC issues generally affect client performance, so they are a
common area that administrators need to troubleshoot.
The ExPTA
collects configuration information and live performance counter
information to analyze each subsystem to determine bottlenecks that can
affect RPC calls. For example, it collects disk, memory, LDAP, and
event viewer data in its analysis. Like the ExBPA tool, the ExPTA also
makes recommendations on how to correct any issues it identifies. Keep
in mind that it is best to run this tool while experiencing performance
issues. Although you can check all of the performance counters
manually, this tool greatly speeds up the troubleshooting process.
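For comparison, the manual check that ExPTA automates can be sketched with the Get-Counter cmdlet in Windows PowerShell 2.0 or later. The counter paths below are a small illustrative selection and are not the set that ExPTA itself collects; MBX01 is a hypothetical server name.

```powershell
# Illustrative counters only; ExPTA analyzes a much broader set.
$counters = @(
    '\MSExchangeIS\RPC Averaged Latency',
    '\MSExchangeIS\RPC Requests',
    '\PhysicalDisk(_Total)\Avg. Disk sec/Read',
    '\Memory\Available MBytes'
)

# Sample every 5 seconds for one minute on a hypothetical Mailbox server.
Get-Counter -ComputerName 'MBX01' -Counter $counters -SampleInterval 5 -MaxSamples 12 |
    ForEach-Object { $_.CounterSamples } |
    Select-Object Timestamp, Path, CookedValue |
    Format-Table -AutoSize
```

As with ExPTA, the samples are most meaningful when they are collected while users are actually experiencing the problem.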
1.3. Exchange Profile Analyzer
The Exchange Profile Analyzer (ExPA) collects statistical information from a single mailbox database or from multiple mailbox
databases across the Exchange organization. The reports include detailed
information such as the average message size, how large mailboxes are,
message counts, and recipient information. This tool mainly helps with
capacity planning. For example, when planning an Exchange Server 2010
Mailbox server, the tool can gather information that can be later used
to complete the Exchange Mailbox Role Calculator spreadsheet.
Also included in the
installation package is the OWA Profile Analyzer. This tool reports on
information such as logon/logoffs and detailed mail operations. Again,
this is useful for reporting on trending information and capacity
planning. Instead of guessing how many users actually use OWA, you can
get solid numbers for reporting.
1.4. Client-Side Issues
Service Pack 1 introduces a new cmdlet for fixing mailbox issues, New-MailboxRepairRequest. This cmdlet can be used to detect and fix the following types of mailbox corruption:
Search folder corruptions
Aggregate counts on folders not reflecting correct values
Views on folders not returning correct contents
Provisioned folders incorrectly pointing into unprovisioned parents or vice versa
When running this task with the database online, mailbox access will be disrupted only for the mailbox that is being repaired. All other mailboxes on the database or server will still be operational.
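In the released Service Pack 1, this repair task is exposed as the New-MailboxRepairRequest cmdlet, and its CorruptionType values map to the four corruption types listed above. The following is a minimal sketch; the mailbox name is hypothetical.

```powershell
# Detect-only pass: report corruption without changing the mailbox.
# 'kim@contoso.com' is a hypothetical mailbox.
New-MailboxRepairRequest -Mailbox 'kim@contoso.com' `
    -CorruptionType SearchFolder,AggregateCounts,FolderView,ProvisionedFolder `
    -DetectOnly

# Run the same request without -DetectOnly to repair what was found.
New-MailboxRepairRequest -Mailbox 'kim@contoso.com' `
    -CorruptionType SearchFolder,AggregateCounts,FolderView,ProvisionedFolder
```

Running the detect-only pass first is a cautious choice, because only the repairing run interrupts access to the mailbox. Progress and results are reported in the Application event log on the Mailbox server.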
Another useful pair of troubleshooting tools is built right into Outlook 2007 and Outlook 2010. The Connection
Status dialog box shows the client's current connection status and
useful information such as the response time and request failures.
Occasionally when the network status changes, Outlook fails to
reconnect automatically. This sometimes occurs when switching between a
wired and wireless connection or enabling a connection to a corporate
VPN. If Outlook does not automatically reconnect, a user can click the
Reconnect button forcing Outlook to attempt to restore the client's
server connection. The second tool is the Test
E-mail AutoConfiguration tool. This is useful for diagnosing issues
with AutoDiscover or the services that AutoDiscover returns.
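A rough server-side counterpart to the Test E-mail AutoConfiguration tool is the Test-OutlookWebServices cmdlet, which checks the Autodiscover settings returned for a mailbox. This is a sketch only; the mailbox address is hypothetical.

```powershell
# Check the Autodiscover service settings returned for a hypothetical mailbox.
Test-OutlookWebServices -Identity 'kim@contoso.com' | Format-List
```

Comparing this output with what the Outlook tool reports can help narrow down whether a problem lies with the client or with the server.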
2. Identifying and Resolving Mail Flow Issues
A number of tools are available to help you troubleshoot mail flow issues. The queue viewer, mail flow troubleshooter, and transport logs all help identify the problem.
2.1. Transport Logs
Sometimes an administrator needs detailed information to troubleshoot mail flow issues. Fortunately, Exchange provides several logs that expose behind-the-scenes information and help identify the root cause. Table 3 summarizes the different logs available within Exchange.
Table 3. Transport Log Summary
LOG | DETAIL | USED FOR |
---|---|---|
Connectivity Logs | Record connection activity of outbound message delivery queues. | Troubleshooting problems with messages reaching their destination Mailbox server, Send connector, or domain. |
Protocol Logs | Track SMTP communication between Exchange servers as part of message routing and delivery. Other protocols can be enabled, such as POP, IMAP, and HTTP. | Troubleshooting message delivery from Send and Receive connectors. |
Routing Table Log | A snapshot of the message routing table used by the Hub Transport or Edge servers. | Troubleshooting internal message delivery. |
Message Tracking Logs | Track the flow of messages between servers. | Troubleshooting message delivery and determining the status of a message. |
Agent Logs | Record actions performed on messages by specific anti-spam agents on Edge or Hub Transport servers. | Troubleshooting messages that have been acted upon by anti-spam agents. |
2.1.1. Connectivity Logs
The connectivity logs record the connections that Hub Transport and
Edge servers make to their destination servers. The detailed connection
information in the log is helpful when
troubleshooting outbound mail flow issues. The connectivity logs do not contain message information, only information about the mail flow process itself.
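Connectivity logging is configured per transport server with the Set-TransportServer cmdlet. A minimal sketch, assuming a Hub Transport server with the hypothetical name HUB01:

```powershell
# 'HUB01' is a hypothetical Hub Transport server name.
Set-TransportServer -Identity 'HUB01' -ConnectivityLogEnabled $true

# Review the connectivity log settings (path, maximum age, and size limits).
Get-TransportServer -Identity 'HUB01' | Format-List Name, ConnectivityLog*
```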
2.1.2. Protocol Logs
The protocol logs are
disabled by default, and can be enabled or disabled on a per-connector
basis. A number of settings are configured on the connector and some
are set on the server and apply across connectors.
Protocol logs should only be
enabled when you are troubleshooting because they can impact the
performance of the server. By default, the logs will only consume 250
MB of disk space, but this may need to be increased because the logs
can grow quickly. Microsoft IT notes that they capture between 5 and 15
GB of protocol logs per day on their Edge servers.
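The per-connector and per-server settings described above can be sketched as follows. The connector and server names are hypothetical, and the 1 GB limit is only an illustration, not a recommendation.

```powershell
# Enable protocol logging on one Send connector and one Receive connector.
# Connector names are hypothetical.
Set-SendConnector -Identity 'Internet Outbound' -ProtocolLoggingLevel Verbose
Set-ReceiveConnector -Identity 'HUB01\Default HUB01' -ProtocolLoggingLevel Verbose

# The directory size limits are set on the server and apply across connectors.
Set-TransportServer -Identity 'HUB01' `
    -SendProtocolLogMaxDirectorySize 1GB `
    -ReceiveProtocolLogMaxDirectorySize 1GB

# Turn logging off again once troubleshooting is finished.
Set-SendConnector -Identity 'Internet Outbound' -ProtocolLoggingLevel None
Set-ReceiveConnector -Identity 'HUB01\Default HUB01' -ProtocolLoggingLevel None
```

Resetting the level to None afterward follows the guidance above that protocol logging should stay on only while you are troubleshooting.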
2.1.3. Routing Table Log
The Routing
Table Log is enabled by default. The table is recalculated and logged
after a routing change or every 12 hours by default. The Microsoft Exchange Transport Service is responsible for this log, and runs on every Hub Transport or Edge server. The Routing Table Log viewer is located in the EMC toolbox, and can be used to read local or remote logs.
The routing log can be used to validate Active Directory's
configuration information. Within the log is information on site and
routing groups, servers, Send connectors, and address spaces.
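The location and retention of the Routing Table Log are properties of the transport server. The sketch below assumes a hypothetical server named HUB01, and the seven-day value is only an example.

```powershell
# Show where the routing table logs are written and how long they are kept.
Get-TransportServer -Identity 'HUB01' | Format-List Name, RoutingTableLog*

# Example only: keep routing table logs for 7 days (days.hours:minutes:seconds).
Set-TransportServer -Identity 'HUB01' -RoutingTableLogMaxAge 7.00:00:00
```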
2.1.4. Agent Logs
The agent
logs can be useful for an administrator who wants to understand why an
action was taken on a message by the anti-spam agents running
on the Edge or Hub Transport servers. The following agents can write
information to this log:
Connection Filter Agent
Content Filter Agent
Edge Rules Agent
Recipient Filter Agent
Sender Filter Agent
Sender ID Agent
The information written to the log depends on the agent and on the action it performed.
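The agent log can be read with the Get-AgentLog cmdlet on a server where the anti-spam agents are installed. The following sketch filters the last 24 hours of entries for one of the agents listed above; the time window and the selected columns are arbitrary choices.

```powershell
# Look at the last 24 hours of agent log entries on the local server.
$since = (Get-Date).AddDays(-1)

Get-AgentLog -StartDate $since -EndDate (Get-Date) |
    Where-Object { $_.Agent -eq 'Content Filter Agent' } |
    Select-Object Timestamp, P1FromAddress, Recipients, Action, Reason, ReasonData |
    Format-Table -AutoSize
```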
2.1.5. Message Tracking Logs
A very common scenario for administrators is tracking
down message delivery. Users report that they sent a message that was
never received, or that they were expecting a message that never
arrived. Exchange 2010 provides a new feature called Delivery Reports that allows users and administrators to easily retrieve transport information about messages. Delivery
reports help answer questions about whether and when messages were
delivered. The information available is governed by role-based
access control: Table 4 lists each reported event and shows how security rights affect what information is available in a report or even what report is available.
Table 4. Delivery Report Information
EVENT | MANAGEMENT ROLE (ROLE GROUP) |
---|---|
E-mail Submission from the Sender's Mailbox | MyBaseOptions (Default); Message Tracking (Organization Management); View-Only Recipients (Help Desk) |
Group Expansion | MyBaseOptions (Default); Message Tracking (Organization Management); View-Only Recipients (Help Desk) |
Delivery Success | MyBaseOptions (Default); Message Tracking (Organization Management); View-Only Recipients (Help Desk) |
Delivery Failure | MyBaseOptions (Default); Message Tracking (Organization Management); View-Only Recipients (Help Desk) |
Inbox Rules | Message Tracking (Recipient Management); Message Tracking (Organization Management); View-Only Recipients (Help Desk) |
Transport Rules | Message Tracking (Organization Management); View-Only Recipients (Help Desk) |
Message was read (if enabled) | MyBaseOptions (Default); Message Tracking (Organization Management); View-Only Recipients (Help Desk) |
Hub Transfers | Message Tracking (Organization Management); View-Only Recipients (Help Desk) |
Transfer to External Servers | MyBaseOptions (Default); Message Tracking (Organization Management); View-Only Recipients (Help Desk) |
Transfer to Older Versions | MyBaseOptions (Default); Message Tracking (Organization Management); View-Only Recipients (Help Desk) |
Moderation | MyBaseOptions (Default); Message Tracking (Organization Management); View-Only Recipients (Help Desk) |
Users can access delivery reports from OWA by clicking the Options button, opening the Exchange control panel, clicking the Organize E-Mail tab and then selecting the Delivery
Reports option. Additionally, right-clicking any message in OWA will
display the Open Delivery Report option. Administrators can access Delivery
Reports from the Exchange Control Panel on the Reporting tab, with
PowerShell cmdlets, or within the Exchange Management Console in the
Toolbox Message Tracking application. An example of a delivery report
is shown in Figure 2.
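From the Exchange Management Shell, an administrator can retrieve the same delivery report data with the Search-MessageTrackingReport and Get-MessageTrackingReport cmdlets. The following is a minimal sketch; the mailbox and recipient are hypothetical.

```powershell
# Find reports for messages sent from a hypothetical mailbox to one recipient.
# -BypassDelegateChecking lets an administrator track on behalf of another user.
$reports = Search-MessageTrackingReport -Identity 'Kim Akers' `
    -Recipients 'david@contoso.com' -BypassDelegateChecking

# Retrieve the summary report for each message that was found.
foreach ($report in $reports) {
    Get-MessageTrackingReport -Identity $report.MessageTrackingReportId `
        -ReportTemplate Summary -BypassDelegateChecking
}
```

Which events appear in the output depends on the management roles assigned to the account that runs the commands.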
The Delivery Report tool uses data from the message tracking logs, which by default keep this data for two weeks. It is important to configure message tracking so that every server retains its log data for the same length of time, because tracking depends on the log data being available at each hop. If a mailbox is moved to a different server, message tracking can no longer follow the path of the message and may fail. Thus, Delivery
Reports are only available for messages in a mailbox that were
generated on the server where the mailbox is currently located. Delivery
Report tracking works as illustrated in Figure 3 and described in the following steps.
ECP calls the Search-MessageTrackingReport task with the parameters of the search.
The Search-MessageTrackingReport task locates the sender's Mailbox server.
The Log Search Service on Mailbox Server1 is queried to determine the message's next hop.
The Log Search Service on Hub Transport1 is queried to determine the message's next hop.
Tracking determines that the message crossed the forest/site boundary.
Tracking next contacts Client Access Server2 via EWS in the remote site.
Client Access Server2 queries the Log Search Service on Hub Transport2.
The Log Search Service on Mailbox Server2 is queried.
Delivery status information is returned to Client Access Server2.
Client Access Server2 returns delivery status information to Client Access Server1.
The task merges all of the results and returns them to the user.
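When a delivery report is not available, the hop-by-hop lookup in the steps above can be approximated by hand with the Get-MessageTrackingLog cmdlet, querying each server in turn. This is a sketch under stated assumptions: the server names and the message ID are hypothetical, and unlike the task it does not cross site or forest boundaries for you.

```powershell
# Hypothetical values: replace with the real message ID and the servers on the path.
$messageId = '<message-id@contoso.com>'
$servers   = 'MBX01', 'HUB01'

# Query the tracking log on each hop and merge the events in time order.
$events = foreach ($server in $servers) {
    Get-MessageTrackingLog -Server $server -MessageId $messageId -ResultSize Unlimited
}

$events |
    Sort-Object Timestamp |
    Format-Table Timestamp, ServerHostname, EventId, Source, Recipients -AutoSize

# Check how long each server keeps its message tracking logs.
Get-TransportServer | Format-List Name, MessageTrackingLogMaxAge
```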
Service Pack 1 also provides the ability to track a message after it has been queued
for delivery. Users and Exchange administrators can now be
informed when a delay means that a message will not meet a delivery SLA.
2.2. Managing Queues
Queues
are a necessary part of transport. Because queues absorb bursts of traffic,
organizations do not have to size their transport infrastructure for peak load, which can be
costly. For example, if traffic spikes once a month during regular
business cycles, it may not make sense to build a platform that is
severely underutilized during non-peak periods. Queues also take
responsibility for redelivery when the remote server is not
responding. In any case, it is important to monitor the queues
for unexpected activity, which may indicate a problem. The Exchange
Management Shell (EMS) and Exchange Management Console (EMC) both provide
interfaces for viewing the status and contents of these queues and for performing actions on the messages or queues.
Exchange actually uses several queues during normal mail transport. Like the other troubleshooting tools, the Queue Viewer is located in the EMC in the toolbox. Figure 4 shows the Queue Viewer Console.
The Queue
Viewer will open the local transport database if one is available, but
it can connect to any Hub Transport database. Edge Transport servers,
on the other hand, can only view their local transport database. The
console is fairly basic and has tabs that display the available queues,
or the messages contained within that queue. Clicking the Create Filter
button allows an administrator to view only queues that match the
filter conditions. For example, Figure 5
shows a filter that when applied will only show queues in a suspended
state. This is very helpful when there are many queues and it is
difficult to find the information you are looking for.
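The same filter can be expressed in the Exchange Management Shell with the Get-Queue cmdlet. A minimal sketch, assuming a hypothetical Hub Transport server named HUB01:

```powershell
# List only the queues that are in a suspended state.
Get-Queue -Server 'HUB01' -Filter { Status -eq 'Suspended' } |
    Format-Table Identity, DeliveryType, Status, MessageCount, NextHopDomain -AutoSize

# Resume those queues once the underlying problem has been corrected.
Resume-Queue -Server 'HUB01' -Filter { Status -eq 'Suspended' }
```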
Selecting a queue
in the main window creates another tab where filters can be
applied. You can also apply filters to the Messages tab. Message
filters can match on subject, specific source
IP addresses, and even SCL value. The message view
also shows the current status of each message. The status can be one of
the following:
Active
If in the delivery queue, the message is being delivered to the next
hop. If in the submission queue, the categorizer is processing the
message.
Pending Remove
An administrator has removed the message, but it is already in the delivery
queue. The message will be deleted if it reenters the queue because of
an error, but will be delivered otherwise.
Pending Suspend
An administrator has suspended the message, but it is already in the
delivery queue. The message will be suspended if it reenters the queue
because of an error, but will be delivered otherwise.
Ready
The message is waiting to be processed.
Retry
The message could not be delivered during the last attempt. Transport will attempt redelivery of the message.
Suspended
The processing of the message has been suspended and no further actions
will be taken on the message until an administrator resumes the message.
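Message status can also be used as a filter in the Exchange Management Shell. The sketch below lists messages in the Retry state and then acts on messages by filter; the server name and sender address are hypothetical.

```powershell
# Show messages that failed their last delivery attempt, with the last error.
Get-Message -Server 'HUB01' -Filter { Status -eq 'Retry' } |
    Format-Table Identity, FromAddress, Status, Queue, Subject, LastError -AutoSize

# Resume every message that an administrator previously suspended.
Resume-Message -Server 'HUB01' -Filter { Status -eq 'Suspended' }

# Remove messages from a hypothetical sender without generating an NDR.
Remove-Message -Server 'HUB01' -Filter { FromAddress -eq 'spammer@example.com' } -WithNDR $false
```

Reviewing the Get-Message output before running Resume-Message or Remove-Message with the same filter is a sensible precaution, because both commands act on every matching message.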
2.3. Message Latency
A frequent question
during message tracking is "Why did the message take so long to be
delivered?" For example, a user reports a message took over an hour to
be delivered. The scripts directory includes a script that converts raw
latency information into human-readable form. From within the scripts
directory, run the following command:
Get-messagetrackinglog -messageid:"<7590C0B7CDB495033BF129504CE4859002394BCB831210
[email protected]>" | ? {$_.MessageLatencyType -eq 'EndToEnd'} |
ConvertTo-MessageLatency | FT -a ComponentServerFqdn,ComponentCode,ComponentLatency
The preceding command produces the following output:
ComponentServerFqdn ComponentCode ComponentLatency
------------------- ------------- ----------------
MBX01.contoso.com TOTAL 00:00:01
MBX01.contoso.com MSSN 00:00:01
HUB01.contoso.com TOTAL 00:00:09
HUB01.contoso.com SDS 00:00:05
HUB01.contoso.com CAT 00:00:01
HUB01.contoso.com SDD 00:00:01
This shows that the Hub Transport and Mailbox
servers handled delivery with a latency of about 9 seconds, far less
than the hour the user reported. In
this case, the latency exists outside of the Exchange organization.
This data is exposed for monitoring purposes through the MSExchange Transport Component Latency
performance object. Its counters report the latency attributed to
specific instances of the object, such as the submission queue,
delivery queue, and the categorizer. The object provides latency
information at the fiftieth, eightieth, ninetieth, ninety-fifth,
and ninety-ninth percentiles of messages processed over the last
5-minute interval. For example, if 99 percent of 100 messages
processed in the last 5 minutes had a latency of 50 seconds or less,
the Percentile 99 counter for the Total Server Latency instance would
be 50.
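To make the percentile idea concrete, the following sketch computes a percentile from a set of latency samples using the nearest-rank method. It illustrates the concept only; Exchange's own calculation is not documented here, and both the function name Get-Percentile and the sample data are hypothetical.

```powershell
function Get-Percentile {
    param(
        [Parameter(Mandatory = $true)] [double[]] $Values,
        [Parameter(Mandatory = $true)] [ValidateRange(1, 100)] [int] $Percentile
    )
    # Sort ascending; @() keeps a single sample as an array.
    $sorted = @($Values | Sort-Object)
    # Nearest rank: the smallest sample with at least P percent of samples at or below it.
    $rank = [int][math]::Ceiling([double]($Percentile * $sorted.Count) / 100)
    $sorted[$rank - 1]
}

# Hypothetical latencies in seconds: 99 messages at 50 seconds or less, plus one slow outlier.
$latencies = @(1..99 | ForEach-Object { 50 * $_ / 99 }) + 600

Get-Percentile -Values $latencies -Percentile 99
Get-Percentile -Values $latencies -Percentile 50
```

With this sample, the single 600-second outlier falls in the top 1 percent of messages and therefore does not raise the 99th percentile, which is why percentile counters are less sensitive to isolated slow messages than a plain average.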
This example assumes SMTP
servers that are inside the organization but not part of the Exchange
organization. It is possible to include such non-Exchange servers in the
latency calculations. Add the IP range to the InternalSMTPServers property using the Set-TransportConfig cmdlet. The External Servers instance is also included on the performance object.
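A minimal sketch of that change follows. The addresses are placeholders from a documentation range and stand in for your own internal SMTP servers.

```powershell
# Add two hypothetical internal SMTP servers without overwriting existing entries.
Set-TransportConfig -InternalSMTPServers @{ Add = '192.0.2.10', '192.0.2.11' }

# Confirm the resulting list.
Get-TransportConfig | Format-List InternalSMTPServers
```

The @{Add=...} syntax appends to the multivalued property; passing a plain list instead would replace whatever entries are already configured.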
2.4. Mail Flow Troubleshooter Tool
Another tool available to administrators for troubleshooting mail flow-related issues is the Mail
Flow Troubleshooter. The troubleshooter can be found with the other
troubleshooting utilities in the Toolbox in the EMC. The tool can help
with a wide variety of scenarios. Figure 6
shows the tool's initial page. Depending on which symptoms you see, the
tool requires different information to automatically diagnose the
problem. The tool then presents an analysis of the possible root causes and
suggests corrective actions.